STAT_DISTRIB
Overview
The STAT_DISTRIB function fits statistical distribution models to observed data using non-linear least squares optimization. This function leverages scipy.optimize.curve_fit to estimate the optimal parameters that minimize the sum of squared residuals between the model predictions and the observed data.
Statistical distributions are mathematical functions that describe the probability of occurrence of different possible outcomes. Fitting these distributions to empirical data is fundamental in reliability engineering, survival analysis, extreme value theory, and quality control. The function supports ten commonly used distributions:
- Gumbel Extreme Value PDF: Models the distribution of extreme values (maxima or minima), commonly used in hydrology and finance for modeling rare events.
- Laplace Double Exponential PDF: A symmetric distribution with heavier tails than the normal distribution, useful for modeling data with sharp peaks.
- Lognormal PDF/CDF: Models variables that are the product of many independent positive factors, widely used in finance and biology.
- Gamma CDF: A flexible family of distributions for modeling waiting times and reliability data.
- Poisson PMF: Models the count of discrete events occurring in a fixed interval, essential for queuing theory and epidemiology.
- Rayleigh CDF: Describes the magnitude of a two-dimensional vector with normally distributed components, used in signal processing.
- Weibull PDF/CDF: The standard distribution for failure analysis and reliability engineering, capable of modeling increasing, decreasing, or constant failure rates.
The optimization process minimizes the objective function:
\chi^2 = \sum_{i=1}^{n} \left( y_i - f(x_i; \boldsymbol{\theta}) \right)^2
where y_i represents the observed values, f(x_i; \boldsymbol{\theta}) is the model prediction at point x_i, and \boldsymbol{\theta} is the vector of parameters to be estimated. The function uses the Levenberg-Marquardt algorithm for unconstrained problems or Trust Region Reflective (TRF) method when parameter bounds are specified. For more details, see the SciPy optimization documentation and the SciPy GitHub repository.
The function returns fitted parameter values along with their estimated standard errors derived from the covariance matrix, enabling uncertainty quantification. Standard errors are computed as \sigma_i = \sqrt{\text{diag}(\mathbf{C})_i} where \mathbf{C} is the parameter covariance matrix.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=STAT_DISTRIB(xdata, ydata, stat_distrib_model)
xdata(list[list], required): The xdata valueydata(list[list], required): The ydata valuestat_distrib_model(str, required): The stat_distrib_model value
Returns (list[list]): 2D list [param_names, fitted_values, std_errors], or error string.
Examples
Example 1: Demo case 1
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| gumbel_extreme_value_pdf | 0.01 | 0.23590106220639281 |
| 2.0075 | 2.7357869830040222 | |
| 4.005 | 0.8174049411915324 | |
| 6.0024999999999995 | 0.08834598785743636 | |
| 8 | 0.039820406618528856 |
Excel formula:
=STAT_DISTRIB("gumbel_extreme_value_pdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.23590106220639281;2.7357869830040222;0.8174049411915324;0.08834598785743636;0.039820406618528856})
Expected output:
"non-error"
Example 2: Demo case 2
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| laplace_double_exponential_pdf | 0.01 | 0.037229680077038044 |
| 2.0075 | 0.23417471703877937 | |
| 4.005 | 0.14697460991010408 | |
| 6.0024999999999995 | 0.028238228287397343 | |
| 8 | 0.01 |
Excel formula:
=STAT_DISTRIB("laplace_double_exponential_pdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.037229680077038044;0.23417471703877937;0.14697460991010408;0.028238228287397343;0.01})
Expected output:
"non-error"
Example 3: Demo case 3
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| lognormal_probability_density | 0.01 | 0.5823323144343956 |
| 2.0075 | 0.3903845656866753 | |
| 4.005 | 0.28738567276996047 | |
| 6.0024999999999995 | 0.23775149983219734 | |
| 8 | 0.19017316235313045 |
Excel formula:
=STAT_DISTRIB("lognormal_probability_density", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.5823323144343956;0.3903845656866753;0.28738567276996047;0.23775149983219734;0.19017316235313045})
Expected output:
"non-error"
Example 4: Demo case 4
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| gamma_cumulative_distribution | 0.1 | 1 |
| 1.3250000000000002 | 1 | |
| 2.5500000000000003 | 1 | |
| 3.7750000000000004 | 1 | |
| 5 | 1 |
Excel formula:
=STAT_DISTRIB("gamma_cumulative_distribution", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {1;1;1;1;1})
Expected output:
"non-error"
Example 5: Demo case 5
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| gumbel_extreme_maxima_cdf | 0.01 | 0.08048585275638268 |
| 2.0075 | 0.38657945462078186 | |
| 4.005 | 0.9757390073660966 | |
| 6.0024999999999995 | 1.029304878230222 | |
| 8 | 0.9954946147896686 |
Excel formula:
=STAT_DISTRIB("gumbel_extreme_maxima_cdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.08048585275638268;0.38657945462078186;0.9757390073660966;1.029304878230222;0.9954946147896686})
Expected output:
"non-error"
Example 6: Demo case 6
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| lognormal_cumulative_distribution | 0.1 | 1 |
| 1.3250000000000002 | 1 | |
| 2.5500000000000003 | 1 | |
| 3.7750000000000004 | 1 | |
| 5 | 1 |
Excel formula:
=STAT_DISTRIB("lognormal_cumulative_distribution", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {1;1;1;1;1})
Expected output:
"non-error"
Example 7: Demo case 7
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| poisson_discrete_count_pmf | 0.01 | 1 |
| 2.0075 | 1 | |
| 4.005 | 1 | |
| 6.0024999999999995 | 1 | |
| 8 | 1 |
Excel formula:
=STAT_DISTRIB("poisson_discrete_count_pmf", {0.01;2.0075;4.005;6.0024999999999995;8}, {1;1;1;1;1})
Expected output:
"non-error"
Example 8: Demo case 8
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| rayleigh_magnitude_cdf | 0.01 | 0.01 |
| 2.0075 | 0.8365273106273831 | |
| 4.005 | 1.011899243812429 | |
| 6.0024999999999995 | 1.0296105430932765 | |
| 8 | 0.9954476090427379 |
Excel formula:
=STAT_DISTRIB("rayleigh_magnitude_cdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.01;0.8365273106273831;1.011899243812429;1.0296105430932765;0.9954476090427379})
Expected output:
"non-error"
Example 9: Demo case 9
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| weibull_failure_analysis_pdf | 0.01 | 0.005770566882927686 |
| 2.0075 | -0.0016062827938215303 | |
| 4.005 | 0.007524508826168242 | |
| 6.0024999999999995 | 0.04264036585516798 | |
| 8 | 0.583887968137886 |
Excel formula:
=STAT_DISTRIB("weibull_failure_analysis_pdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.005770566882927686;-0.0016062827938215303;0.007524508826168242;0.04264036585516798;0.583887968137886})
Expected output:
"non-error"
Example 10: Demo case 10
Inputs:
| stat_distrib_model | xdata | ydata |
|---|---|---|
| weibull_reliability_cdf | 0.01 | 0.031269365717061556 |
| 2.0075 | 1.4029550205119903 | |
| 4.005 | 2.1574344532424874 | |
| 6.0024999999999995 | 2.5385337350489827 | |
| 8 | 2.6109817879427717 |
Excel formula:
=STAT_DISTRIB("weibull_reliability_cdf", {0.01;2.0075;4.005;6.0024999999999995;8}, {0.031269365717061556;1.4029550205119903;2.1574344532424874;2.5385337350489827;2.6109817879427717})
Expected output:
"non-error"
Python Code
import numpy as np
from scipy.optimize import curve_fit as scipy_curve_fit
from scipy import special as sc
import math
def stat_distrib(xdata, ydata, stat_distrib_model):
"""
Fits stat_distrib models to data using scipy.optimize.curve_fit. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html for details.
See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
This example function is provided as-is without any representation of accuracy.
Args:
xdata (list[list]): The xdata value
ydata (list[list]): The ydata value
stat_distrib_model (str): The stat_distrib_model value Valid options: Gumbel Extreme Value Pdf, Laplace Double Exponential Pdf, Lognormal Probability Density, Gamma Cumulative Distribution, Gumbel Extreme Maxima Cdf, Lognormal Cumulative Distribution, Poisson Discrete Count Pmf, Rayleigh Magnitude Cdf, Weibull Failure Analysis Pdf, Weibull Reliability Cdf.
Returns:
list[list]: 2D list [param_names, fitted_values, std_errors], or error string.
"""
def _validate_data(xdata, ydata):
"""Validate and convert both xdata and ydata to numpy arrays."""
for name, arg in [("xdata", xdata), ("ydata", ydata)]:
if not isinstance(arg, list) or len(arg) < 2:
raise ValueError(f"{name}: must be a 2D list with at least two rows")
vals = []
for i, row in enumerate(arg):
if not isinstance(row, list) or len(row) == 0:
raise ValueError(f"{name} row {i}: must be a non-empty list")
try:
vals.append(float(row[0]))
except Exception:
raise ValueError(f"{name} row {i}: non-numeric value")
if name == "xdata":
x_arr = np.asarray(vals, dtype=np.float64)
else:
y_arr = np.asarray(vals, dtype=np.float64)
if x_arr.shape[0] != y_arr.shape[0]:
raise ValueError("xdata and ydata must have the same number of rows")
return x_arr, y_arr
# Model definitions dictionary
models = {
'gumbel_extreme_value_pdf': {
'params': ['y0', 'xc', 'w', 'A'],
'model': lambda x, y0, xc, w, A: y0 + A * np.exp(-np.exp(-(x - xc) / w) - (x - xc) / w + 1.0),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.median(xa)), float(max((np.max(xa) - np.min(xa)) / 6.0, 1e-3)), float(np.ptp(ya) if np.ptp(ya) else 1.0)),
'bounds': ([-np.inf, -np.inf, 0.0, -np.inf], np.inf),
},
'laplace_double_exponential_pdf': {
'params': ['y0', 'a', 'b'],
'model': lambda x, y0, a, b: y0 + 0.5 / b * np.exp(-np.abs(x - a) / b),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.median(xa)), float(max(np.std(xa), 1e-3))),
'bounds': ([-np.inf, -np.inf, 0.0], np.inf),
},
'lognormal_probability_density': {
'params': ['A', 'mu', 'sigma'],
'model': lambda x, A, mu, sigma: A * np.exp(-np.power(np.log(x) - mu, 2) / (2.0 * sigma * sigma)) / x,
'guess': lambda xa, ya: (float(max(ya)), float(np.mean(np.log(xa[xa > 0])) if np.any(xa > 0) else 1.0), 1.0),
'bounds': ([-np.inf, -np.inf, 0.0], np.inf),
},
'gamma_cumulative_distribution': {
'params': ['y0', 'A1', 'a', 'b'],
'model': lambda x, y0, A1, a, b: y0 + A1 * sc.gammainc(a, np.clip(x, 0.0, None) / b),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), 2.0, 1.0),
'bounds': ([-np.inf, 0.0, 0.0, 0.0], np.inf),
},
'gumbel_extreme_maxima_cdf': {
'params': ['a', 'b'],
'model': lambda x, a, b: 1.0 - np.exp(-np.exp((x - a) / b)),
'guess': lambda xa, ya: (float(np.median(xa)), 1.0),
'bounds': ([-np.inf, 0.0], np.inf),
},
'lognormal_cumulative_distribution': {
'params': ['y0', 'A1', 'xc', 'w'],
'model': lambda x, y0, A1, xc, w: np.where(x > 0, y0 + A1 * 0.5 * (1.0 + sc.erf((np.log(x) - xc) / (w * np.sqrt(2.0)))), y0),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), float(np.log(np.median(xa[xa > 0])) if np.any(xa > 0) else 0.0), 1.0),
'bounds': ([-np.inf, 0.0, -np.inf, 0.0], np.inf),
},
'poisson_discrete_count_pmf': {
'params': ['A', 'lam'],
'model': lambda x, A, lam: A * (np.power(lam, x) * np.exp(-lam) / sc.gamma(x + 1.0)),
'guess': lambda xa, ya: (float(max(ya)), float(np.mean(xa))),
'bounds': ([-np.inf, 0.0], np.inf),
},
'rayleigh_magnitude_cdf': {
'params': ['b'],
'model': lambda x, b: 1.0 - np.exp(-np.square(x) / (2.0 * np.square(b))),
'guess': lambda xa, ya: (float(np.std(xa)),),
'bounds': (0.0, np.inf),
},
'weibull_failure_analysis_pdf': {
'params': ['y0', 'a', 'r', 'u'],
'model': lambda x, y0, a, r, u: y0 + (r / a) * np.power(np.clip(x - u, 0.0, None) / a, r - 1.0) * np.exp(-np.power(np.clip(x - u, 0.0, None) / a, r)),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.std(xa) if np.std(xa) else 1.0), 1.5, float(np.min(xa))),
'bounds': ([-np.inf, 0.0, 0.0, -np.inf], np.inf),
},
'weibull_reliability_cdf': {
'params': ['y0', 'A1', 'a', 'b'],
'model': lambda x, y0, A1, a, b: np.where(x > 0, y0 + A1 * (1.0 - np.exp(-np.power(x / a, b))), y0),
'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), 1.0, 1.5),
'bounds': ([-np.inf, 0.0, 0.0, 0.0], np.inf),
}
}
# Validate model parameter
if stat_distrib_model not in models:
return f"Invalid model: {str(stat_distrib_model)}. Valid models are: {', '.join(models.keys())}"
model_info = models[stat_distrib_model]
# Validate and convert input data
try:
x_arr, y_arr = _validate_data(xdata, ydata)
except ValueError as e:
return f"Invalid input: {e}"
# Perform curve fitting
try:
p0 = model_info['guess'](x_arr, y_arr)
bounds = model_info.get('bounds', (-np.inf, np.inf))
if bounds == (-np.inf, np.inf):
popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, maxfev=10000)
else:
popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, bounds=bounds, maxfev=10000)
fitted_vals = [float(v) for v in popt]
for v in fitted_vals:
if math.isnan(v) or math.isinf(v):
return "Fitting produced invalid numeric values (NaN or inf)."
except ValueError as e:
return f"Initial guess error: {e}"
except Exception as e:
return f"curve_fit error: {e}"
# Calculate standard errors
std_errors = None
try:
if pcov is not None and np.isfinite(pcov).all():
std_errors = [float(v) for v in np.sqrt(np.diag(pcov))]
except Exception:
pass
return [model_info['params'], fitted_vals, std_errors] if std_errors else [model_info['params'], fitted_vals]